IRIX 6.2 Development Libraries

Text File | 1996-03-14 | 13KB | 331 lines

BLAS(3F)                                                            BLAS(3F)

NAME
     BLAS, libblas - Basic Linear Algebra Subprograms

DESCRIPTION
     BLAS is a library of routines that perform basic operations involving
     matrices and vectors. They were designed as a way of achieving
     efficiency in the solution of linear algebra problems. The BLAS, as
     they are now commonly called, have been very successful and have been
     used in a wide range of software, including LINPACK, LAPACK and many
     of the algorithms published in ACM Transactions on Mathematical
     Software. They are an aid to clarity, portability, modularity and
     maintenance of software, and have become the de facto standard for
     elementary vector and matrix operations.

     The BLAS promote modularity by identifying frequently occurring
     operations of linear algebra and by specifying a standard interface to
     these operations. Efficiency is achieved through optimization within
     the BLAS, without altering the higher-level code that references them.

     There are three levels of BLAS. The original set of BLAS, commonly
     referred to as the Level 1 BLAS, perform low-level operations such as
     the dot product and the adding of a multiple of one vector to another.
     Typically these operations involve O(N) floating-point operations and
     O(N) data items moved (loaded or stored), where N is the length of the
     vectors. The Level 1 BLAS permit efficient implementation on scalar
     machines, but the ratio of floating-point operations to data movement
     is too low to achieve effective use of most vector or parallel
     hardware.

     The Level 2 BLAS perform matrix-vector operations that occur
     frequently in the implementation of many of the most common linear
     algebra algorithms. They involve O(N^2) floating-point operations.
     Algorithms that use Level 2 BLAS can be very efficient on vector
     computers, but are not well suited to computers with a hierarchy of
     memory (such as cache memory).
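
     The operation counts quoted above for the Level 1 and Level 2 BLAS
     can be read off plain loops. The following is only a sketch in C: the
     sketch_* names are hypothetical, the loops are unoptimized, and they
     are not the routines shipped in libblas. The matrix is assumed to be
     stored by columns, as in Fortran.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative loops only: the sketch_* names are hypothetical and are
   not the optimized routines shipped in libblas. */

/* Level 1, "axpy" pattern: y = a*x + y.
   2*n flops against 3*n loads and stores, so both the work and the
   data movement are O(N). */
void sketch_daxpy(size_t n, double a, const double *x, double *y)
{
    size_t i;
    for (i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Level 1, "dot" pattern: returns the sum of x[i]*y[i]. */
double sketch_ddot(size_t n, const double *x, const double *y)
{
    double sum = 0.0;
    size_t i;
    for (i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Level 2, "gemv" pattern: y = A*x + y for an m-by-n matrix A stored by
   columns with leading dimension lda. About 2*m*n flops, and every
   element of A is read exactly once, so A gets no reuse from cache. */
void sketch_dgemv(size_t m, size_t n, const double *a, size_t lda,
                  const double *x, double *y)
{
    size_t i, j;
    assert(lda >= m);                /* columns must not overlap */
    for (j = 0; j < n; j++)          /* walk A one column at a time */
        for (i = 0; i < m; i++)
            y[i] += a[j * lda + i] * x[j];
}
```

     For example, with x = (1, 2, 3) and y = (4, 5, 6), sketch_ddot
     returns 32, and sketch_daxpy with a = 2 turns y into (6, 9, 12).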

     The Level 3 BLAS are targeted at matrix-matrix operations. These
     operations generally involve O(N^3) floating-point operations, while
     only creating O(N^2) data movement. These operations permit efficient
     reuse of data that resides in cache and create what is often called
     the surface-to-volume effect for the ratio of computations to data
     movement. In addition, matrices can be partitioned into blocks, and
     operations on distinct blocks can be performed in parallel; within
     the operations on each block, scalar or vector operations may also be
     performed in parallel.

     BLAS2 and BLAS3 modules have been optimized and parallelized to take
     advantage of SGI's RISC parallel architecture. The best performance
     is achieved for BLAS3 routines (e.g. DGEMM), where "outer-loop"
     unrolling and "blocking" techniques were applied to take advantage of
     the memory cache. The performance of BLAS2 routines (e.g. DGEMV) is
     sensitive to the size of the problem: for large sizes the high
     cache-miss rate slows the routines down.

     LAPACK algorithms preferentially use BLAS3 modules and are the most
     efficient. LINPACK uses only BLAS1 modules and is therefore less
     efficient than LAPACK.

     To link with "libblas", it is advised to use "f77" to load all the
     Fortran libraries required. For Power Challenge, you should use the
     mips4 version. This is accomplished by using -mips4 when linking:

          f77 -mips4 -o foobar.out foo.o bar.o -lblas

     To use the parallelized version, use

          f77 -mips4 -o foobar.out foo.o bar.o -lblas_mp

SUMMARY
     BLAS Level 1:

     .....function......      ....prefix,suffix.....    rootname
     dot product              s- d- c-u c-c z-u z-c     -dot-
     y = a*x + y              s- d- c- z-               -axpy
     setup Givens rotation    s- d-                     -rotg
     apply Givens rotation    s- d- cs- zd-             -rot
     copy x into y            s- d- c- z-               -copy
     swap x and y             s- d- c- z-               -swap
     Euclidean norm           s- d- sc- dz-             -nrm2
     sum of absolute values   s- d- sc- dz-             -asum
     x = a*x                  s- d- cs- c- zd- z-       -scal
     index of max abs value   is- id- ic- iz-           -amax

     BLAS Level 2:

     MV   Matrix vector multiply
     R    Rank one update to a matrix
     RC   Rank one update to a matrix (conjugated, complex only)
     RU   Rank one update to a matrix (unconjugated, complex only)
     R2   Rank two update to a matrix
     SV   Solving certain triangular matrix problems

     Single precision Level 2 BLAS | Double precision Level 2 BLAS
     --------------------------------------------------------------
          MV  R   R2  SV   |       MV  R   R2  SV
     SGE  x   x            |  DGE  x   x
     SGB  x                |  DGB  x
     SSP  x   x   x        |  DSP  x   x   x
     SSY  x   x   x        |  DSY  x   x   x
     SSB  x                |  DSB  x
     STR  x           x    |  DTR  x           x
     STB  x           x    |  DTB  x           x
     STP  x           x    |  DTP  x           x

     Complex Level 2 BLAS | Double precision complex Level 2 BLAS
     --------------------------------------------------------------
          MV  R   RC  RU  R2  SV   |       MV  R   RC  RU  R2  SV
     CGE  x       x   x            |  ZGE  x       x   x
     CGB  x                        |  ZGB  x
     CHE  x   x           x        |  ZHE  x   x           x
     CHP  x   x           x        |  ZHP  x   x           x
     CHB  x                        |  ZHB  x
     CTR  x                   x    |  ZTR  x                   x
     CTB  x                   x    |  ZTB  x                   x
     CTP  x                   x    |  ZTP  x                   x

     BLAS Level 3:

     MM   Matrix matrix multiply
     RK   Rank-k update to a matrix
     R2K  Rank-2k update to a matrix
     SM   Solving triangular matrix with many right-hand sides

     Single precision Level 3 BLAS | Double precision Level 3 BLAS
     --------------------------------------------------------------
          MM  RK  R2K  SM   |       MM  RK  R2K  SM
     SGE  x                 |  DGE  x
     SSY  x   x   x         |  DSY  x   x   x
     STR  x            x    |  DTR  x            x

     Complex Level 3 BLAS | Double precision complex Level 3 BLAS
     --------------------------------------------------------------
          MM  RK  R2K  SM   |       MM  RK  R2K  SM
     CGE  x                 |  ZGE  x
     CSY  x   x   x         |  ZSY  x   x   x
     CHE  x   x   x         |  ZHE  x   x   x
     CTR  x            x    |  ZTR  x            x

C INTERFACE
     There is a C interface for the BLAS library.
     The implementation is based on the proposed specification for BLAS
     routines in C [1]. The argument lists closely follow the equivalent
     Fortran ones. The main changes are that enumeration types are used
     instead of character types for option specification, and that
     two-dimensional arrays are stored in one-dimensional C arrays in the
     same fashion as a Fortran array (column major). Therefore, a matrix A
     would be stored as:

          double (*a)[lda*n];
          /* a is a pointer to an array of size lda*n */

     where element A(i+1,j) of matrix A is stored immediately after the
     element A(i,j), while A(i,j+1) is lda elements apart from A(i,j). The
     element A(i,j) of the matrix can be accessed directly by reference to
     (*a)[ (j-1)*lda + (i-1) ], or to a[ (j-1)*lda + (i-1) ] when a is
     declared as a plain pointer to double.

     The C versions of the BLAS can have the same names as the Fortran
     versions because the compiler puts the Fortran names in upper case
     and adds an underscore after the name.

     The argument lists use the following data types:

          Integer:  an integer data type of 32 bits.
          float:    the regular single precision floating-point type.
          double:   the regular double precision floating-point type.
          Complex:  a single precision complex type.
          Zomplex:  a double precision complex type.

     plus the enumeration types given by

          typedef enum { NoTranspose, Transpose, ConjugateTranspose }
                  MatrixTranspose;
          typedef enum { UpperTriangle, LowerTriangle }
                  MatrixTriangle;
          typedef enum { UnitTriangular, NotUnitTriangular }
                  MatrixUnitTriangular;
          typedef enum { LeftSide, RightSide }
                  OperationSide;

     The complex data types are stored in Cartesian form, i.e., as real
     and imaginary parts. For example:

          typedef struct { float real; float imag; } Complex;
          typedef struct { double real; double imag; } Zomplex;

     The operations performed by the C BLAS are identical to those
     performed by the corresponding Fortran BLAS, as specified in [2], [3]
     and [4].

     To use the C BLAS, link with "libblas".
     It is advised to use "f77" to load all the Fortran libraries required:

          f77 -o foobar.out foo.o bar.o -lblas

FILES
     /usr/lib/libblas.a
     /usr/lib/libblas_mp.a
     /usr/include/cblas.h

ORIGIN
     The original Fortran source code comes from netlib.

REFERENCES
     [1] S.P. Datardina, J.J. Du Croz, S.J. Hammarling and M.W. Pont, "A
         Proposed Specification of BLAS Routines in C", NAG Technical
         Report TR6/90.

     [2] C. Lawson, R. Hanson, D. Kincaid, and F. Krogh, "Basic Linear
         Algebra Subprograms for Fortran Usage", ACM Trans. on Math.
         Soft. 5 (1979) 308-325.

     [3] J. Dongarra, J. Du Croz, S. Hammarling, and R. Hanson, "An
         Extended Set of Fortran Basic Linear Algebra Subprograms", ACM
         Trans. on Math. Soft. 14, 1 (1988) 1-32.

     [4] J. Dongarra, J. Du Croz, I. Duff, and S. Hammarling, "A Set of
         Level 3 Basic Linear Algebra Subprograms", ACM Trans. on Math.
         Soft. (Dec 1989).
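
EXAMPLES
     The column-major storage described under C INTERFACE, and the
     O(N^3) work against O(N^2) data quoted for the Level 3 BLAS under
     DESCRIPTION, can be sketched in plain C. This is an illustration
     under stated assumptions, not the library's implementation:
     sketch_index and sketch_dgemm are hypothetical names that do not
     appear in cblas.h, and the triple loop is unblocked and unoptimized.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of cblas.h: offset of the Fortran
   element A(i,j), with 1-based i and j, inside a column-major array
   whose leading dimension is lda. */
size_t sketch_index(size_t i, size_t j, size_t lda)
{
    assert(i >= 1 && j >= 1 && i <= lda);
    return (j - 1) * lda + (i - 1);
}

/* Level 3, "gemm" pattern: C = A*B + C, with A m-by-k, B k-by-n and
   C m-by-n, all stored by columns. The loops perform 2*m*n*k flops on
   only m*k + k*n + m*n matrix elements, which is the surface-to-volume
   effect: each element is reused many times. */
void sketch_dgemm(size_t m, size_t n, size_t k,
                  const double *a, size_t lda,
                  const double *b, size_t ldb,
                  double *c, size_t ldc)
{
    size_t i, j, p;
    assert(lda >= m && ldb >= k && ldc >= m);
    for (j = 0; j < n; j++)              /* column j of C and B */
        for (p = 0; p < k; p++) {        /* column p of A, row p of B */
            double bpj = b[j * ldb + p];
            for (i = 0; i < m; i++)      /* unit stride down a column */
                c[j * ldc + i] += a[p * lda + i] * bpj;
        }
}
```

     With lda = 4, sketch_index(2, 1, 4) returns 1 and sketch_index(1, 2,
     4) returns 4: A(2,1) directly follows A(1,1), while A(1,2) lies lda
     elements away. The innermost loop of sketch_dgemm runs down a column
     so that it reads consecutive memory.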